Finding, Assessing, and Integrating Statistical Sources for Data Mining

Authors

  • Karin Becker
  • Xiaojie Tan
  • Shiva Jahangiri
  • Craig A. Knoblock
Abstract

As the knowledge discovery process has been widely applied in a variety of domains, there is a growing opportunity to use the Linked Open Data (LOD) cloud as a primary data source for knowledge discovery. The key challenges are finding the relevant data across various sources and then using that data for the desired analysis. There is a striking increase in the availability of statistical data and indicators (e.g., social, economic) in the LOD, and the Cube ontology has become the de facto standard for describing them according to a multi-dimensional model. In this paper we discuss a detailed scenario for using the LOD as a primary source of data for building analysis models in the Peacebuilding domain. Next, we present an approach to finding potentially relevant cube datasets in the LOD cloud, assessing their compatibility, and then integrating the compatible datasets to enable the application of data…
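The abstract does not say how compatibility is assessed or how datasets are merged. The following is only a minimal sketch under stated assumptions, not the authors' method. It models each cube dataset in memory as a set of named dimensions, one measure, and observations keyed by dimension values. This loosely mirrors how the RDF Data Cube vocabulary attaches dimensions and measures to a `qb:DataSet`. The class `CubeDataset` and the functions `compatible` and `integrate` are hypothetical. The compatibility rule ("both cubes use the same set of dimensions") is an assumed simplification. The dimension names `refArea` and `refPeriod` are borrowed loosely from SDMX naming for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CubeDataset:
    """Hypothetical in-memory stand-in for a statistical cube dataset."""
    name: str
    dimensions: tuple[str, ...]  # e.g. ("refArea", "refPeriod")
    measure: str                 # name of the single observed measure
    observations: dict[tuple, float] = field(default_factory=dict)


def compatible(a: CubeDataset, b: CubeDataset) -> bool:
    """Assumed rule: two cubes are compatible if they share the same dimensions."""
    return set(a.dimensions) == set(b.dimensions)


def integrate(a: CubeDataset, b: CubeDataset) -> list[dict]:
    """Join two compatible cubes on matching dimension values (inner join)."""
    if not compatible(a, b):
        raise ValueError("cubes do not share the same dimensions")
    if a.measure == b.measure:
        raise ValueError("measures must have distinct names to be merged")
    # For each dimension of `a`, find its position in `b`'s key tuples,
    # so b's keys can be reordered into a's dimension order.
    positions = [b.dimensions.index(d) for d in a.dimensions]
    b_aligned = {
        tuple(key[i] for i in positions): value
        for key, value in b.observations.items()
    }
    rows = []
    # Keep only observations present in both cubes.
    for key in sorted(a.observations.keys() & b_aligned.keys()):
        row = dict(zip(a.dimensions, key))
        row[a.measure] = a.observations[key]
        row[b.measure] = b_aligned[key]
        rows.append(row)
    return rows


# Toy usage with placeholder values (not real indicators).
indicator_a = CubeDataset("indicatorA", ("refArea", "refPeriod"), "measureA",
                          {("area1", 2010): 1.0, ("area2", 2010): 2.0})
indicator_b = CubeDataset("indicatorB", ("refPeriod", "refArea"), "measureB",
                          {(2010, "area1"): 10.0, (2011, "area1"): 11.0})
print(integrate(indicator_a, indicator_b))
# prints [{'refArea': 'area1', 'refPeriod': 2010, 'measureA': 1.0, 'measureB': 10.0}]
```

The second cube lists its dimensions in a different order, so the sketch realigns keys by dimension name rather than by position before joining. A real assessment over LOD cubes would likely also need to reconcile code lists, units, and granularity. That step is omitted here.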


Similar resources

LiDDM: A Data Mining System for Linked Data

In today’s scenario, the quantity of linked data is growing rapidly. The data includes ontologies, governmental data, statistics and so on. With more and more sources publishing the data, the amount of linked data is becoming enormous. The task of obtaining the data from various sources, integrating and fine-tuning the data for desired statistical analysis assumes prominence. So there is need o...

Integrating AHP and data mining for effective retailer segmentation based on retailer lifetime value

Data mining techniques have been used widely in the area of customer relationship management (CRM). In this study, we have applied data mining techniques to address a problem in business-to-business (B2B) setting. In a manufacturer-retailer-consumer chain, a manufacturer should improve its relationship with retailers to continue its business. Segmentation is a useful tool for identifying groups...

Ratio Rule Mining from Multiple Data Sources

Both multiple source data mining and streaming data mining problems have attracted much attention in the past decade. In contrast to traditional association-rule mining, to capture the quantitative association knowledge, a new paradigm called Ratio Rule (RR) was proposed recently. We extend this framework to mining ratio rules from multiple source data streams which is a novel and challenging p...

Data Mining in the Presence of Quantitatively and Qualitatively Diverse Information

The work under this grant has established several concepts for working with diverse data. As a first step, abstractions have been developed that appropriately represent the richness of data types such as sequence and graph data, and their combination with conventional data types such as Boolean (or item) data. As a second step towards integrating diverse data, techniques have been developed for...

Heavy metal pollution and identification of their sources in soil over Sangan iron-mining region, NE Iran

The aim of this study was to determine the extent of metal pollutions and the identification of their major sources in the vicinity of the Sangan iron mine occurring in NE Iran. Soil samples were collected from the vicinity of the mine site and analyzed for heavy metals. In addition, the chemical speciation of these metals was investigated by means of the sequential extraction procedure. The st...


Publication date: 2015